Abstract

Scheduling is a critical and challenging resource allocation mechanism for multi-hop wireless networks.
It is well known that scheduling schemes that give a higher priority to the link with larger
queue length can achieve high throughput performance. However, this queue-length-based approach could
potentially suffer from large (even infinite) packet delays due to the well-known last packet problem,
whereby packets may get excessively delayed due to lack of subsequent packet arrivals. Delay-based
schemes have the potential to resolve this last packet problem by scheduling the link based on the
delay that the packet has encountered. However, the throughput performance of delay-based schemes
has largely been an open problem except in limited cases of single-hop networks. In this paper, we
investigate delay-based scheduling schemes for multi-hop traffic scenarios. We view packet delays from
a different perspective, and develop a scheduling scheme based on a new delay metric. Through rigorous
analysis, we show that the proposed scheme achieves the optimal throughput performance. Finally, we
conduct extensive simulations to support our analytical results, and show that the delay-based scheduler
successfully removes excessive packet delays, while it achieves the same throughput region as the
queue-length-based scheme.

I. Introduction 

Link scheduling is a critical resource allocation component in multi-hop wireless networks, and also 
perhaps the most challenging. The celebrated Queue-length-based Back-Pressure (Q-BP) scheduler [1]
has been shown to be throughput-optimal and can stabilize the network under any feasible load. Since
the development of Q-BP, there have been numerous extensions that have integrated it in an overall
optimal cross-layer solution. Further, easier-to-implement queue-length-based scheduling schemes have
been developed and shown to be throughput-efficient (see [2] and references therein). Some recent attempts
[3]-[5] focus on designing real-world wireless protocols using the ideas behind these algorithms.

While these queue-length-based schedulers have been shown to achieve excellent throughput performance,
they are usually evaluated under the assumption that flows have an infinite amount of data and
keep injecting packets into the network. However, in practice, accounting for multiple time scales [6]-[8],
there also exist other types of flows that have a finite number of packets to transmit, which can result
in the well-known last packet problem. Consider a queue that holds the last packet of a flow: the
packet does not see any subsequent packet arrivals, so the queue length remains very small and the
link may be starved for a long time, since queue-length-based schemes give a higher priority to links
with a larger queue length. In such a scenario, it has also been shown in [6] that queue-length-based
schemes may not even be throughput-optimal.

Recent works in [9]-[14] have studied the performance of delay-based scheduling algorithms that
use the Head-of-Line (HOL) delay instead of queue length as link weight. One desirable property of the
delay-based approach is that it provides an intuitive way around the last packet problem. The schedulers
give a higher priority to the links with a larger weight as before, but the weight (i.e., the HOL delay)
of a link increases with time until the link gets scheduled. Hence, if the link with the last packet is not
scheduled in the current time slot, it is more likely to be scheduled in the next one. However, the throughput
of the delay-based scheduling schemes is not fully understood, and has been established only for limited
cases of single-hop networks.

The delay-based approach was introduced in [9] for scheduling in Input-Queued switches. The results
have been extended to wireless networks for single-hop traffic, providing throughput-optimal
delay-based MaxWeight scheduling algorithms [11], [12], [15]. It is also shown that delay-based schemes with
appropriately chosen weight parameters provide good Quality of Service (QoS) [10], and can be used
as an important component in a cross-layer protocol design [14]. The performance of the delay-based
MaxWeight scheduler has been further investigated in a single-hop network with flow dynamics [13].
The results show that, when flows arrive at the base station carrying a finite amount of data, the
delay-based MaxWeight scheduler achieves the optimal throughput performance while its queue-length-based
counterpart does not.

However, in multi-hop wireless networks, the throughput performance of these delay-based schemes has 
largely been an open problem. To the best of our knowledge, there are no prior works that employ delay- 
based algorithms to address the important issue of throughput-optimal scheduling in multi-hop wireless 
networks. The problem turns out to be far more challenging in the multi-hop scenario due to the following
reason.

November 30, 2010 DRAFT

In [12], the key idea in showing throughput optimality of the delay-based MaxWeight scheduler
is to exploit the following property: after a finite time, there exists a linear relation between queue lengths 
and HOL delays, where the ratio is the mean arrival rate. Hence, the delay-based MaxWeight scheme 
is basically equivalent to its queue-length-based counterpart, and thus achieves the optimal throughput. 
This property holds for the single-hop traffic, since given that the exogenous arrival processes follow 
the Strong Law of Large Numbers (SLLN) and the fluid limits exist, the arrival processes turn out to be 
deterministic processes with constant rates in the fluid limit model. However, such a linear relation does 
not necessarily hold for the multi-hop traffic, since the packet arrival rate at a non-source node (or a relay 
node) is not a constant and depends on the underlying scheduler's dynamics. To this end, we investigate 
delay-based scheduling schemes that achieve the optimal throughput performance in multi-hop wireless 
networks. 

Unlike previous delay-based schemes, we view packet delay as a sojourn time in the network, and
re-design the delay metric of a queue as the delay difference between the queue's HOL packet and the
HOL packet of its previous hop (see Eq. (44) for the formal definition). Using this new metric, we can
establish a linear relation between queue lengths and delays in the fluid limit model. Then the linear 
relation plays the key role in showing that the proposed Delay-based Back-Pressure (D-BP) scheduling 
scheme is throughput-optimal in multi-hop networks. 

In summary, the main contributions of our paper are as follows: 

• We re-visit throughput optimality of Q-BP using fluid limit techniques. Throughput optimality of
Q-BP was originally shown using the standard Lyapunov technique in a stochastic sense. We
re-derive it using fluid limit techniques so that we can extend
the analysis to D-BP using the linear relation between queue lengths and delays in the fluid limit
model.

• We devise a new delay metric for D-BP and show that it achieves optimal throughput performance
in multi-hop wireless networks. Calculating a link weight as the sojourn time difference of the HOL
packet, we establish a linear relation between queue lengths and delays in the fluid limit model,
which leads to throughput optimality of D-BP by the same analytical procedure as for Q-BP.

• We conduct extensive simulations to evaluate the performance of delay-based schedulers. Through 
simulations, we observe that the last packet problem can cause excessive delays for certain flows 
under Q-BP, while the problem is eliminated under D-BP. Further, in the case of Q-BP, even 
though the average delays experienced in the network may be similar to D-BP, the tail of the 
delay distribution could be substantially longer. We also show that D-BP can not only achieve the
same throughput region as Q-BP, but also guarantee better fairness by scheduling the links based
on delays and not starving certain flows that lack subsequent packet arrivals (or have very large
inter-arrival times between groups of packet arrivals).

The remainder of the paper is organized as follows. In Section II, we present a detailed description of
our system model. In Section III, we show throughput optimality of Q-BP using fluid limit techniques,
and extend the analysis to D-BP in Section IV. We evaluate the performance of delay-based schedulers
through simulations in Section V, and conclude our paper in Section VI.

II. System Model 

We consider a multi-hop wireless network described by a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes
the set of nodes and $\mathcal{E}$ denotes the set of links. Nodes are wireless transmitters/receivers and links are
wireless channels between two nodes if they can directly communicate with each other. During a single
time slot, multiple links that do not interfere can be active at the same time, and each active link transmits
one packet during the time slot if its queue is not empty. Let $\mathcal{S}$ denote the set of flows in the network.
We assume that each flow has a single, fixed, and loop-free route. The route of flow $s$ has an $H(s)$-hop
length from the source to the destination, where each $k$-th hop link is denoted by $(s,k)$. Note that the
assumption of single route and unit capacity is for ease of exposition, and one can easily extend the
results to more general networks with multiple fixed routes and heterogeneous capacities. To specify
wireless interference, we consider the $k$-th hop of each flow $s$, or link-flow-pair $(s,k)$. Let $\mathcal{P}$ denote the
set of all link-flow-pairs, i.e.,

$$\mathcal{P} = \{(s,k) \mid s \in \mathcal{S},\ 1 \le k \le H(s)\}. \tag{1}$$

The set of link-flow-pairs that interfere with $(s,k)$ can be described as

$$I(s,k) = \{(r,j) \in \mathcal{P} \mid (s,k) \text{ interferes with } (r,j), \text{ or } (r,j) = (s,k)\}. \tag{2}$$

Note that the interference model we adopt is very general. A schedule is a set of (active or inactive)
link-flow-pairs, and can be represented by a vector $M \in \{0,1\}^{|\mathcal{P}|}$, where each link-flow-pair is set to 1
if it is active and 0 if it is inactive, and $|\cdot|$ denotes the cardinality of a set. A schedule $M$ is said to be
feasible if no two link-flow-pairs of $M$ interfere with each other, i.e., $(r,j) \notin I(s,k)$ for all distinct $(r,j), (s,k)$
with $M_{r,j} = 1$ and $M_{s,k} = 1$. Let $\mathcal{M}_{\mathcal{P}}$ denote the set of all feasible schedules in $\mathcal{P}$, and let $Co(\mathcal{M}_{\mathcal{P}})$
denote its convex hull.
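To make the feasibility condition concrete, the following minimal sketch enumerates every feasible schedule of a toy network by brute force. The function name `feasible_schedules`, the dictionary-based interference map, and the three-hop line example are illustrative assumptions and not part of the model above; the enumeration is exponential in the number of link-flow-pairs and is meant only for very small instances.

```python
from itertools import combinations

def feasible_schedules(pairs, interferes):
    """Enumerate all feasible schedules as frozensets of active link-flow-pairs.

    pairs: list of (s, k) link-flow-pairs.
    interferes: dict mapping each pair to the set of OTHER pairs it interferes with
                (hypothetical representation; a pair need not list itself).
    """
    schedules = []
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            # A subset is feasible if no two of its members interfere.
            ok = all(b not in interferes[a] and a not in interferes[b]
                     for a, b in combinations(subset, 2))
            if ok:
                schedules.append(frozenset(subset))
    return schedules

# Hypothetical example: one flow "s" on a 3-hop line where adjacent hops interfere.
pairs = [("s", 1), ("s", 2), ("s", 3)]
interferes = {
    ("s", 1): {("s", 2)},
    ("s", 2): {("s", 1), ("s", 3)},
    ("s", 3): {("s", 2)},
}
schedules = feasible_schedules(pairs, interferes)
```

In this toy line network the feasible schedules are the empty schedule, the three single-pair schedules, and the schedule that activates the first and third hops together.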
Let $A_s(t)$ denote the number of packet arrivals at the source node of flow $s$ at time slot $t$. We assume
that the packet arrival processes satisfy the Strong Law of Large Numbers (SLLN): with probability 1,

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} A_s(\tau) = \lambda_s, \tag{3}$$

for all flows $s \in \mathcal{S}$, and their fluid limits exist [16]. We call $\lambda_s$ the arrival rate of flow $s$, and let
$\lambda = [\lambda_1, \lambda_2, \ldots, \lambda_{|\mathcal{S}|}]$ denote its vector. Assumption (3) on arrival processes is mild. It is satisfied, for
example, when the number of arrivals at each time slot is i.i.d. across time with mean rates $\lambda$.

Let $Q_{s,k}(t)$ denote the number of packets at the queue of $(s,k)$ at the beginning of time slot $t$.
Slightly abusing the notation, we also use $Q_{s,k}$ to denote the queue. We denote the queue length vector
at time slot $t$ by $Q(t) = [Q_{s,k}(t), (s,k) \in \mathcal{P}]$, and use $\|\cdot\|$ to denote the $L_1$-norm of a vector, e.g.,
$\|Q(t)\| = \sum_{(s,k) \in \mathcal{P}} Q_{s,k}(t)$. Let $\Pi_{s,k}(t)$ denote the service of $Q_{s,k}$ at time slot $t$, which takes either 1 if
link-flow-pair $(s,k)$ is active, or 0 otherwise, in our settings. We denote the actual number of packets
transmitted from $Q_{s,k}$ at time slot $t$ by $\Psi_{s,k}(t)$. Clearly, we have $\Psi_{s,k}(t) \le \Pi_{s,k}(t)$ for all time slots
$t \ge 0$. Let $P_{s,k}(t) = \sum_{i=1}^{k} Q_{s,i}(t)$ denote the summed queue length of queues up to the $k$-th hop for
flow $s$. By setting

$$Q_{s,H(s)+1} = 0, \tag{4}$$

we have

$$P_{s,H(s)+1} = P_{s,H(s)}. \tag{5}$$

The queue length evolves according to the following equations:

$$Q_{s,k}(t+1) = Q_{s,k}(t) + \Psi_{s,k-1}(t) - \Psi_{s,k}(t), \tag{6}$$

where we set $\Psi_{s,0}(t) = A_s(t)$.
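The queue evolution can be sketched as a one-slot update in which each hop receives what the previous hop actually transmitted and the first hop receives the exogenous arrivals. The function `step_queues` and its dictionary arguments are hypothetical names chosen for illustration; the sketch assumes unit link capacity, as in the model above.

```python
def step_queues(Q, arrivals, schedule, hops):
    """Advance queue lengths by one slot: new Q = Q + inflow from previous hop - outflow.

    Q: dict {(s, k): queue length at the start of the slot}.
    arrivals: dict {s: exogenous arrivals of flow s in this slot}.
    schedule: set of active link-flow-pairs (assumed feasible).
    hops: dict {s: number of hops of flow s}.
    """
    # An active pair actually sends one packet only if its queue is non-empty.
    sent = {p: 1 if (p in schedule and Q[p] > 0) else 0 for p in Q}
    new_Q = {}
    for s, H in hops.items():
        for k in range(1, H + 1):
            # The first hop is fed by exogenous arrivals, later hops by the previous hop.
            inflow = arrivals.get(s, 0) if k == 1 else sent[(s, k - 1)]
            new_Q[(s, k)] = Q[(s, k)] + inflow - sent[(s, k)]
    return new_Q

# Hypothetical two-hop flow: one arrival, first hop scheduled.
hops = {"s": 2}
Q0 = {("s", 1): 2, ("s", 2): 0}
Q1 = step_queues(Q0, {"s": 1}, {("s", 1)}, hops)
# Next slot: no arrivals, second hop scheduled.
Q2 = step_queues(Q1, {}, {("s", 2)}, hops)
```

Packets transmitted by the last hop simply leave the network, which corresponds to the convention that the queue after the last hop is always empty.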

Let $F_s(t)$ be the total number of packets that arrive at the source node of flow $s$ until time slot $t \ge 0$,
including those present at time slot 0, and let $\hat{F}_{s,k}(t)$ be the total number of packets that are served at
$Q_{s,k}$ until time slot $t \ge 0$. We by convention set $\hat{F}_{s,k}(0) = 0$ for all link-flow-pairs $(s,k) \in \mathcal{P}$. We let
$Z_{s,k,i}(t)$ denote the sojourn time of the $i$-th packet of $Q_{s,k}$ in the network at time slot $t$, where the time
is measured from when the packet arrives in the network (i.e., when the packet arrives at the source
node), and let $W_{s,k}(t)$ denote the sojourn time of the Head-of-Line (HOL) packet of $Q_{s,k}$ in the network
at time slot $t$, i.e., $W_{s,k}(t) = Z_{s,k,1}(t)$. We set

$$W_{s,0}(t) = 0 \tag{7}$$

and

$$W_{s,H(s)+1}(t) = W_{s,H(s)}(t), \tag{8}$$

for all $s \in \mathcal{S}$. Further, if $Q_{s,k}(t) = 0$, we set

$$W_{s,k}(t) = W_{s,k-1}(t). \tag{9}$$

Letting $U_{s,k}(t) = t - W_{s,k}(t)$ denote the time when the HOL packet of $Q_{s,k}$ arrives in the network, we
have that

$$U_{s,k}(t) = \inf\{\tau \le t \mid F_s(\tau) > \hat{F}_{s,k}(t)\}, \text{ for all } t \ge 0. \tag{10}$$

We next define the stability of a network as follows.
Definition 1: A network of queues is said to be stable if

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[\|Q(\tau)\|] < \infty. \tag{11}$$

We define the throughput region of a scheduling policy as the set of rates for which the network
remains stable under this policy. Further, we define the optimal throughput region (or stability region) as
the union of the throughput regions of all possible scheduling policies. The optimal throughput region
$\Lambda^*$ can be presented as

$$\Lambda^* = \{\lambda \mid \exists \phi \in Co(\mathcal{M}_{\mathcal{P}}) \text{ s.t. } \lambda_s \le \phi_{s,k}, \text{ for all } (s,k) \in \mathcal{P}\}. \tag{12}$$

An arrival rate vector is strictly inside $\Lambda^*$ if the inequalities above are all strict.

III. Throughput Optimality of Q-BP Using Fluid Limits 

It has been shown in [1] that Q-BP stabilizes the network for any feasible arrival rate vector using
stochastic Lyapunov techniques. Specifically, we can use a quadratic-form Lyapunov function to show 
that the function has a negative drift under Q-BP when queue lengths are large enough. In this section, 
we re-visit throughput optimality of Q-BP using fluid limit techniques. The analysis will be extended 
later to prove throughput optimality of the delay-based back-pressure algorithm. 

To begin with, we define the queue differential $\Delta Q_{s,k}(t)$ as

$$\Delta Q_{s,k}(t) = Q_{s,k}(t) - Q_{s,k+1}(t), \tag{13}$$

and specify the back-pressure algorithm based on queue lengths as follows.
Queue-length-based Back-Pressure (Q-BP) algorithm:

$$M^* \in \arg\max_{M \in \mathcal{M}_{\mathcal{P}}} \sum_{(s,k) \in \mathcal{P}} \Delta Q_{s,k}(t) \cdot M_{s,k}. \tag{14}$$

The algorithm needs to solve a MaxWeight problem with weights as queue differentials, and ties can be
broken arbitrarily if more than one schedule has the largest weight sum.
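As an illustration of the MaxWeight step, the sketch below selects a Q-BP schedule by exhaustive search over feasible subsets of link-flow-pairs. It is a brute-force sketch for toy instances only, not an efficient or distributed implementation; the name `qbp_schedule`, the interference dictionary, and the example queue lengths are assumptions made for this example, and ties are resolved by keeping the first maximizer found.

```python
from itertools import combinations

def qbp_schedule(Q, hops, interferes):
    """Brute-force Q-BP: return (schedule, weight) maximizing the sum of queue differentials."""
    def dQ(s, k):
        # The queue after the last hop is taken to be empty.
        nxt = Q[(s, k + 1)] if k < hops[s] else 0
        return Q[(s, k)] - nxt

    pairs = list(Q)
    best, best_weight = frozenset(), 0  # the empty schedule has weight 0
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            # Skip subsets containing two interfering link-flow-pairs.
            if any(b in interferes[a] or a in interferes[b]
                   for a, b in combinations(subset, 2)):
                continue
            weight = sum(dQ(s, k) for s, k in subset)
            if weight > best_weight:
                best, best_weight = frozenset(subset), weight
    return best, best_weight

# Hypothetical 3-hop line where adjacent hops interfere.
hops = {"s": 3}
interferes = {
    ("s", 1): {("s", 2)},
    ("s", 2): {("s", 1), ("s", 3)},
    ("s", 3): {("s", 2)},
}
Q = {("s", 1): 5, ("s", 2): 1, ("s", 3): 4}
schedule, weight = qbp_schedule(Q, hops, interferes)
```

Here the queue differentials are 4, -3, and 4 for hops 1, 2, and 3, so the search activates the first and third hops together for a total weight of 8.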

We establish the fluid limits of the system and prove throughput optimality of Q-BP using fluid limit 
techniques. 

A. Fluid limits 

We define the process describing the behavior of the underlying system as $\mathcal{X} = (X(t), t = 0, 1, 2, \ldots)$,
where

$$X(t) \triangleq \big( (Z_{s,k,1}(t), \ldots, Z_{s,k,Q_{s,k}(t)}(t)) : (s,k) \in \mathcal{P} \big). \tag{15}$$

The evolution of $\mathcal{X}$ forms a discrete time Markov chain, if a scheduling decision is based on the
information of the current time slot only. It is clear that $\mathcal{X}$ forms a Markov chain under Q-BP. Motivated
by Definition 1, we define the norm of $X(t)$ as

$$\|X(t)\| \triangleq \|Q(t)\|. \tag{16}$$

Let $\mathcal{X}^{(x_n)}$ denote a process $\mathcal{X}$ with an initial configuration such that

$$\|X^{(x_n)}(0)\| = x_n. \tag{17}$$

All the processes $\mathcal{X}^{(x_n)}$ satisfy the properties of the original system $\mathcal{X}$.

The following lemma was derived in [17] for continuous time countable Markov chains, and it follows
from more general results in [18] for discrete time countable Markov chains.

Lemma 1: Suppose there exists an integer $T > 0$ such that, for any sequence of processes $\{\mathcal{X}^{(x_n)}\}$,
we have that

$$\lim_{x_n \to \infty} E\left[\frac{1}{x_n} \|X^{(x_n)}(x_n T)\|\right] = 0, \tag{18}$$

then the system is stable.

The stability criterion (18) leads to a fluid limit approach [16], [19] to the stability problem of queueing
systems. Hence, we start our analysis by establishing the fluid limit model as in [12], [16]. We define
the process

$$\mathcal{Y} \triangleq (A, F, \hat{F}, Q, P, \Pi, \Psi, W, U), \tag{19}$$

and it is clear that a sample path of $\mathcal{Y}$ uniquely defines the sample path of $\mathcal{X}$. Then we extend the
definition of $\mathcal{Y}$, i.e., of $A$, $F$, $\hat{F}$, $Q$, $P$, $\Pi$, $\Psi$, $W$, and $U$, to the continuous time domain as $Y(t) = Y(\lfloor t \rfloor)$ for
each continuous time $t \ge 0$. Note that $Y(t)$ is right continuous with left limits.
As in [12], we extend the definition of $F_s^{(x_n)}(t)$ to the negative interval $t \in [-x_n, 0)$ by assuming that
the packets present in the initial state $X^{(x_n)}(0)$ arrived in the past at some of the time instants
$-(x_n - 1), -(x_n - 2), \ldots, 0$, according to their delays in the state $X^{(x_n)}(0)$. By this convention, $F_s^{(x_n)}(-x_n) = 0$
for all $s \in \mathcal{S}$ and $x_n$, and

$$\sum_{s \in \mathcal{S}} F_s^{(x_n)}(0) = x_n, \tag{20}$$

for all $x_n$.

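Then, using the techniques of Theorem 4.1 of [16], we can show that, for almost all sample paths and
for all positive sequences $x_n \to \infty$, there exists a subsequence $x_{n_j}$ with $x_{n_j} \to \infty$ such that, for all $s \in \mathcal{S}$
and all $(s,k) \in \mathcal{P}$, the following convergences hold uniformly over compact (u.o.c.) intervals:

$$\frac{1}{x_{n_j}} \int_0^{x_{n_j} t} A_s^{(x_{n_j})}(\tau)\, d\tau \to \lambda_s t, \tag{21}$$

$$\frac{1}{x_{n_j}} F_s^{(x_{n_j})}(x_{n_j} t) \to f_s(t), \tag{22}$$

$$\frac{1}{x_{n_j}} \hat{F}_{s,k}^{(x_{n_j})}(x_{n_j} t) \to \hat{f}_{s,k}(t), \tag{23}$$

$$\frac{1}{x_{n_j}} Q_{s,k}^{(x_{n_j})}(x_{n_j} t) \to q_{s,k}(t), \tag{24}$$

$$\frac{1}{x_{n_j}} P_{s,k}^{(x_{n_j})}(x_{n_j} t) \to p_{s,k}(t), \tag{25}$$

$$\frac{1}{x_{n_j}} \int_0^{x_{n_j} t} \Pi_{s,k}^{(x_{n_j})}(\tau)\, d\tau \to \int_0^t \pi_{s,k}(\tau)\, d\tau, \tag{26}$$

$$\frac{1}{x_{n_j}} \int_0^{x_{n_j} t} \Psi_{s,k}^{(x_{n_j})}(\tau)\, d\tau \to \int_0^t \psi_{s,k}(\tau)\, d\tau, \tag{27}$$

and similarly, the following convergences (which are denoted by "$\Rightarrow$") hold at every continuity point of
the limit function:

$$\frac{1}{x_{n_j}} W_{s,k}^{(x_{n_j})}(x_{n_j} t) \Rightarrow w_{s,k}(t), \tag{28}$$

$$\frac{1}{x_{n_j}} U_{s,k}^{(x_{n_j})}(x_{n_j} t) \Rightarrow u_{s,k}(t). \tag{29}$$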
At almost all points $t \in [0, \infty)$, the derivatives of these limit functions exist. We call such points regular
times. Moreover, the limits satisfy that

$$\sum_{s \in \mathcal{S}} f_s(0) = 1, \tag{30}$$

$$p_{s,k}(t) = \sum_{i=1}^{k} q_{s,i}(t), \tag{31}$$

$$p_{s,k}(t) = f_s(t) - \hat{f}_{s,k}(t), \tag{32}$$

$$f_s(t) = f_s(0) + \lambda_s t, \tag{33}$$

$$u_{s,k}(t) = t - w_{s,k}(t), \tag{34}$$

$$\psi_{s,k}(t) \le \pi_{s,k}(t), \tag{35}$$

$$\Delta q_{s,k}(t) = q_{s,k}(t) - q_{s,k+1}(t), \tag{36}$$

$$\frac{d}{dt} q_{s,k}(t) = \begin{cases} \psi_{s,k-1}(t) - \pi_{s,k}(t), & q_{s,k}(t) > 0, \\ [\psi_{s,k-1}(t) - \pi_{s,k}(t)]^+, & q_{s,k}(t) = 0, \end{cases} \tag{37}$$

where $[z]^+ = \max(z, 0)$, and we set $\psi_{s,0} = \pi_{s,0} = \lambda_s$.

It is clear from (14) that Q-BP will not schedule link-flow-pair $(s,k)$ if $Q_{s,k}(t) - Q_{s,k+1}(t) < 0$. This
implies that, if

$$Q_{s,k}(t) \ge Q_{s,k+1}(t) - 2 \tag{38}$$

initially holds for all $(s,k)$ at time slot 0, then the inequality holds for every time slot $t \ge 0$. This further
implies that

$$q_{s,k}(t) \ge q_{s,k+1}(t), \text{ i.e., } \Delta q_{s,k}(t) \ge 0, \tag{39}$$

for all (scaled) time $t \ge 0$. Without loss of generality, we assume that, at time slot 0, all queues on each
route are empty, except for the first queue. Then (38) holds for all time slots $t \ge 0$, and
thus $\Delta q_{s,k}(t) \ge 0$ holds for all (scaled) time $t \ge 0$.

B. Throughput optimality of Q-BP 

Proposition 2: Q-BP can support any traffic with arrival rate vector that is strictly inside $\Lambda^*$.

Proof: We prove the stability using the standard Lyapunov technique. We consider a quadratic-form 
Lyapunov function in the fluid limit model of the system, and show that it has a negative drift, which 
implies that the fluid limit model and thus the original system is stable. 

Let $V(q(t))$ denote the Lyapunov function defined as

$$V(q(t)) = \frac{1}{2} \sum_{(s,k) \in \mathcal{P}} (q_{s,k}(t))^2. \tag{40}$$
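Suppose $\lambda$ is strictly inside $\Lambda^*$; then there exists a vector $\phi(t) \in Co(\mathcal{M}_{\mathcal{P}})$ such that $\lambda < \phi(t)$,
i.e., $\lambda_s < \phi_{s,k}(t)$ for all $(s,k) \in \mathcal{P}$. Since $q(t)$ is differentiable, for any regular time $t \ge 0$ such that
$V(q(t)) > 0$, we can obtain the derivative of $V(q(t))$ as

$$\begin{aligned}
\frac{d^+}{dt} V(q(t)) &= \sum_{(s,k) \in \mathcal{P}} q_{s,k}(t) \cdot \big(\psi_{s,k-1}(t) - \pi_{s,k}(t)\big) \\
&\le \sum_{(s,k) \in \mathcal{P}} q_{s,k}(t) \cdot \big(\pi_{s,k-1}(t) - \pi_{s,k}(t)\big) \\
&= \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \lambda_s - \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \pi_{s,k}(t) \\
&= \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \big(\lambda_s - \phi_{s,k}(t)\big) + \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \big(\phi_{s,k}(t) - \pi_{s,k}(t)\big),
\end{aligned} \tag{41}$$

where $\frac{d^+}{dt} V(q(t)) = \lim_{\delta \downarrow 0} \frac{V(q(t+\delta)) - V(q(t))}{\delta}$, and the first equality and the inequality are from (37) and
(35), respectively. The second equality regroups the sum by hops, using $\pi_{s,0}(t) = \lambda_s$ and $q_{s,H(s)+1}(t) = 0$.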

Note that in the final result of (41), i) the first term is negative because $\lambda < \phi(t)$
and $\Delta q_{s,k}(t) \ge 0$ for all $(s,k)$, with $\Delta q_{s,k}(t) > 0$ for at least one $(s,k)$ whenever $V(q(t)) > 0$, and
ii) the second term is non-positive: since Q-BP chooses
schedules that maximize the queue differential weight sum (14), its fluid limit $\pi(t)$ satisfies

$$\pi(t) \in \arg\max_{\phi \in Co(\mathcal{M}_{\mathcal{P}})} \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \phi_{s,k}, \tag{42}$$

which implies that

$$\sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \phi_{s,k}(t) \le \sum_{(s,k) \in \mathcal{P}} \Delta q_{s,k}(t) \cdot \pi_{s,k}(t), \tag{43}$$

for all $\phi(t) \in Co(\mathcal{M}_{\mathcal{P}})$. Therefore, we have $\frac{d^+}{dt} V(q(t)) < 0$ and the fluid limit model of the system is
stable, which implies that the original system is also stable by Theorem 4.2 of [16]. ■
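The regrouping step in a drift computation of this kind is a summation by parts along each route: a sum of queue lengths weighted by inflow minus outflow equals the arrival rate times the first-hop queue minus a sum of queue differentials weighted by service rates. The sketch below checks this identity numerically for one flow; the function name `drift_terms` and the sample numbers are hypothetical, and the check assumes that the inflow to the first hop is the arrival rate and that the queue after the last hop is empty.

```python
def drift_terms(q, pi, lam):
    """Evaluate both sides of the summation-by-parts identity for one flow.

    q:   fluid queue lengths for hops 1..H.
    pi:  service rates for hops 1..H.
    lam: arrival rate, used as the inflow to hop 1.
    """
    H = len(q)
    pi_prev = [lam] + pi[:-1]  # inflow rate of each hop
    # Left side: sum_k q_k * (pi_{k-1} - pi_k).
    lhs = sum(q[k] * (pi_prev[k] - pi[k]) for k in range(H))
    # Queue differentials, with an empty queue after the last hop.
    dq = [q[k] - (q[k + 1] if k + 1 < H else 0.0) for k in range(H)]
    # Right side: lam * sum_k dq_k - sum_k dq_k * pi_k.
    rhs = lam * sum(dq) - sum(dq[k] * pi[k] for k in range(H))
    return lhs, rhs

lhs, rhs = drift_terms(q=[3.0, 2.0, 0.5], pi=[0.4, 0.7, 0.2], lam=0.3)
```

Both sides evaluate to the same number for any choice of inputs, because the queue differentials along a route telescope to the first-hop queue length.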

IV. Throughput Optimality of D-BP 

A. Algorithm description 

Next, we develop the Delay-based Back-Pressure (D-BP) policy, which establishes a linear relation between
queue lengths and delays in the fluid limit model. The idea first appeared in [12] for single-hop
networks. However, when packets travel multiple hops before leaving the system, the analytical approach
in [12] (i.e., using HOL delay in the queue as the metric) cannot capture queueing dynamics of multi-hop
traffic and the resultant solutions cannot guarantee the linear relation. This is because the arrival rate
of a relay node is not a constant and depends on the system dynamics (i.e., depends on the underlying
scheduling policies). In this section, we carefully design link weights using a new delay metric, and
re-establish the linear relation between queue lengths and delays under multi-hop traffic.

Recall that $W_{s,k}(t)$ denotes the sojourn time of the HOL packet of queue $Q_{s,k}(t)$ in the network,
where the time is measured from when the packet arrives in the network. We define the delay metric
$\hat{W}_{s,k}(t)$ as

$$\hat{W}_{s,k}(t) = W_{s,k}(t) - W_{s,k-1}(t), \tag{44}$$

and also define the delay differential as

$$\Delta \hat{W}_{s,k}(t) = \hat{W}_{s,k}(t) - \hat{W}_{s,k+1}(t). \tag{45}$$

The relations between these delay metrics are illustrated in Fig. 1. We specify the back-pressure algorithm
with the new delay metric as follows.
Delay-based Back-Pressure (D-BP) algorithm:

$$M^* \in \arg\max_{M \in \mathcal{M}_{\mathcal{P}}} \sum_{(s,k) \in \mathcal{P}} \Delta \hat{W}_{s,k}(t) \cdot M_{s,k}. \tag{46}$$

D-BP computes the weight of $(s,k)$ as the delay differential $\Delta \hat{W}_{s,k}(t)$ and solves the MaxWeight problem,
i.e., finds a set of non-interfering link-flow-pairs that maximizes the weight sum. Ties can be broken arbitrarily
if more than one schedule has the largest weight sum. An intuitive interpretation of the
new delay metric $\hat{W}_{s,k}(t)$ is as follows. Note that the queue length $Q_{s,k}(t)$ is roughly the number of
packets arriving at the source of flow $s$ during the time slots in $[U_{s,k}(t), U_{s,k}(t) + \hat{W}_{s,k}(t))$, and
$Q_{s,k}(t)$ is in the order of $\lambda_s \hat{W}_{s,k}(t)$ when $\hat{W}_{s,k}(t)$ is large. Hence, a large $\hat{W}_{s,k}(t)$ implies a large queue
length $Q_{s,k}(t)$, and similarly, a large delay differential $\Delta \hat{W}_{s,k}(t)$ implies a large queue length differential
$\Delta Q_{s,k}(t)$. Therefore, being favorable to the delay weight sum in (46) is in some sense "equivalent" to
being favorable to the queue length weight sum in (14) as Q-BP. We later formally establish the linear
relation between the fluid limits of queue lengths and delays in Section IV-B.
Clearly, D-BP also will not schedule link-flow-pair $(s,k)$ if

$$\hat{W}_{s,k}(t) - \hat{W}_{s,k+1}(t) < 0. \tag{47}$$
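The delay weights can be computed directly from the HOL sojourn times. The following is a minimal sketch under the boundary conventions of Section II as read here: the sojourn time at the fictitious hop 0 is taken to be zero (an assumption, since the value is not legible in the source text), an empty queue inherits the sojourn time of its previous hop, and the hop after the last one inherits the last hop's value. The function name `delay_weights` and the example numbers are hypothetical.

```python
def delay_weights(W, Q, hops):
    """Compute delay metrics W_hat and delay differentials dW_hat for every flow.

    W: dict {(s, k): sojourn time of the HOL packet of queue (s, k)}; ignored for empty queues.
    Q: dict {(s, k): queue length}.
    hops: dict {s: number of hops of flow s}.
    W_hat also contains the fictitious hop H+1, whose metric is always zero.
    """
    W_hat, dW_hat = {}, {}
    for s, H in hops.items():
        w = {0: 0}  # assumed convention: zero sojourn time at hop 0
        for k in range(1, H + 1):
            # An empty queue inherits the upstream value, so its metric is zero.
            w[k] = W[(s, k)] if Q[(s, k)] > 0 else w[k - 1]
        w[H + 1] = w[H]  # hop after the destination inherits the last hop's value
        for k in range(1, H + 2):
            W_hat[(s, k)] = w[k] - w[k - 1]
        for k in range(1, H + 1):
            dW_hat[(s, k)] = W_hat[(s, k)] - W_hat[(s, k + 1)]
    return W_hat, dW_hat

# Hypothetical 3-hop flow: a single old packet waits at hop 2, fresh packets sit at hop 1.
hops = {"s": 3}
Q = {("s", 1): 3, ("s", 2): 1, ("s", 3): 0}
W = {("s", 1): 2, ("s", 2): 10, ("s", 3): 0}
W_hat, dW_hat = delay_weights(W, Q, hops)
```

In this example the queue differentials would be 2, 1, and 0, so a queue-length-based weight favors hop 1, whereas the delay differentials are -6, 8, and 0, so the lone old packet at hop 2 receives the largest weight. This mirrors the way a delay-based weight keeps growing for a packet that sees no subsequent arrivals.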
[Figure: flow $s$ traversing consecutive links $(s,k-1)$, $(s,k)$, $(s,k+1)$, annotated with $\hat{W}_{s,k}(t) = W_{s,k}(t) - W_{s,k-1}(t)$, $\hat{W}_{s,k+1}(t) = W_{s,k+1}(t) - W_{s,k}(t)$, and $\Delta \hat{W}_{s,k}(t) = \hat{W}_{s,k}(t) - \hat{W}_{s,k+1}(t)$.]

Fig. 1. Delay differentials using new delay metric.




Let $B_{s,k}(t)$ denote the inter-arrival time between the HOL packet of $Q_{s,k}(t)$ and the packet that arrives
immediately after it. The aforementioned operation of D-BP implies that, if the inequality

$$\hat{W}_{s,k}(t) \ge \hat{W}_{s,k+1}(t) - 2 B_{s,k}(t) \tag{48}$$

initially holds for all $(s,k)$ at time slot 0, then the inequality holds for all time slots $t \ge 0$. This further
leads to

$$\hat{w}_{s,k}(t) \ge \hat{w}_{s,k+1}(t), \text{ i.e., } \Delta \hat{w}_{s,k}(t) \ge 0, \tag{49}$$

for all (scaled) time $t \ge 0$ in the fluid limits, since $\frac{1}{x_{n_j}} B_{s,k}^{(x_{n_j})}(x_{n_j} t) \to 0$ as $x_{n_j} \to \infty$; otherwise we
would arrive at a contradiction to the fact that the arrival process satisfies the Strong Law of Large Numbers.
Recall that we assume that all queues on each route are empty, except for the first queue, at time slot 0;
then (48) and (49) follow.

B. Analysis of throughput performance 

We first establish the linear relation between the fluid limits of queue lengths and delays in the following 
lemma. We will use the lemma later to show that D-BP achieves the optimal throughput. 

Lemma 3: For any fixed $t_{s,k} > 0$ and any link-flow-pair $(s,k) \in \mathcal{P}$, the two conditions $u_{s,k}(t_{s,k}) > 0$
and $\hat{f}_{s,k}(t_{s,k}) > f_s(0)$ are equivalent. Further, if these conditions hold, we have

$$p_{s,k}(t) = \lambda_s w_{s,k}(t), \tag{50}$$

$$q_{s,k}(t) = \lambda_s \hat{w}_{s,k}(t), \tag{51}$$

for all $t \ge t_{s,k}$, with probability 1.

Fig. 2 describes the relations between the variables.

Proof: Since the first part, i.e., that the two conditions are equivalent, is straightforward from the definition
of fluid limits and (10), we focus on the second part, i.e., if $\hat{f}_{s,k}(t_{s,k}) > f_s(0)$, then (50) and (51) follow.
Suppose that

$$\hat{f}_{s,k}(t_{s,k}) > f_s(0). \tag{52}$$

Then, by definition of $u_{s,k}(t)$, we have

$$\hat{f}_{s,k}(t) = f_s(u_{s,k}(t)), \tag{53}$$

for all $t \ge t_{s,k}$. From (32), (33), and (53), we obtain that

$$\begin{aligned}
p_{s,k}(t) &= f_s(t) - \hat{f}_{s,k}(t) \\
&= (f_s(0) + \lambda_s t) - (f_s(0) + \lambda_s u_{s,k}(t)) \\
&= \lambda_s \cdot (t - u_{s,k}(t)) \\
&= \lambda_s w_{s,k}(t).
\end{aligned} \tag{54}$$

Further, (51) follows from (31) and the fluid limit version of (44). ■
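The last step of the proof can be spelled out as follows. This is a sketch of one way to read it, assuming the conventions $p_{s,0}(t) = 0$ and $w_{s,0}(t) = 0$ for the first hop, and assuming that the relation between summed queue length and sojourn time also holds at hop $k-1$ (which is plausible because hop $k-1$ has served at least as many packets as hop $k$):

```latex
\begin{align*}
q_{s,k}(t) &= p_{s,k}(t) - p_{s,k-1}(t)
  && \text{(summed queue lengths differ by the $k$-th queue)} \\
&= \lambda_s w_{s,k}(t) - \lambda_s w_{s,k-1}(t)
  && \text{(linear relation at hops $k$ and $k-1$)} \\
&= \lambda_s \bigl( w_{s,k}(t) - w_{s,k-1}(t) \bigr)
 = \lambda_s \hat{w}_{s,k}(t)
  && \text{(fluid version of the delay metric)}
\end{align*}
```

In words, the fluid queue length at a hop equals the arrival rate times the difference between the sojourn times of the HOL packets at that hop and at the previous hop.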
We emphasize the importance of (51). Lemma 3 implies that after a finite time (i.e., $\max_{(s,k) \in \mathcal{P}} t_{s,k}$),
queue lengths are $\lambda_s$ times delays in the fluid limit model. Then the schedules of D-BP are very similar
to those of Q-BP, which implies that D-BP achieves the optimal throughput region $\Lambda^*$. In the following,
we show that such a finite time exists.

Lemma 4: Consider a system under the D-BP policy. For $\lambda$ strictly inside $\Lambda^*$, there exists a time $T > 0$
such that the fluid limits satisfy the following property with probability 1,

$$\hat{f}_{s,k}(T) > f_s(0), \tag{55}$$

for all link-flow-pairs $(s,k) \in \mathcal{P}$.
We can prove Lemma 4 by induction following the techniques described in Lemma 7 of [12]. The
formal proof is provided in Appendix A. We next outline an informal discussion, which highlights the
main idea of the proof. First, we consider the base case. D-BP chooses one of the feasible schedules in
$\mathcal{M}_{\mathcal{P}}$ (we omit the term "feasible" in the following, whenever there is no confusion) at each time slot.
Each schedule receives a fraction of the total time and there must exist a schedule that gets at least a
$1/|\mathcal{M}_{\mathcal{P}}|$ fraction of the total time. Thus, after a large enough time $T_1 > 0$, there must exist a schedule $M^*$ that
is chosen for at least $T_1/|\mathcal{M}_{\mathcal{P}}|$ amount of time. The number of initial packets of $M^*$ is bounded from (30);
thus, for a large enough $T_1$, all initial packets of at least one link-flow-pair of $M^*$ must be completely
served, i.e., $\hat{f}_{s,k}(T_1) > f_s(0)$, for at least one $(s,k)$ with $M^*_{s,k} = 1$.

Next, we consider the inductive step. Suppose there exists a $T_l > 0$, such that for at least one subset
$S_l \subset \mathcal{P}$ of cardinality $l$, we have

$$\hat{f}_{s,k}(T_l) > f_s(0), \tag{56}$$

for all $(s,k) \in S_l$. We claim that there then exists $T_{l+1} \ge T_l$ such that

$$\hat{f}_{s,k}(T_{l+1}) > f_s(0), \tag{57}$$

holds for all link-flow-pairs $(s,k)$ within at least one subset $S_{l+1} \subset \mathcal{P}$ of cardinality $l+1$. Note that, if
$(s,k) \in S_l$, then $(s,i) \in S_l$ for $1 \le i \le k$. Let

$$S_l^* = \{(r,j) \mid (r,j) \notin S_l,\ (r,j-1) \in S_l, \text{ for } j > 1; \text{ or } (r,j) \notin S_l, \text{ for } j = 1\} \tag{58}$$

denote the set of link-flow-pairs $(r,j)$ such that $(r,j) \in \mathcal{P} \setminus S_l$ is the closest hop to the source of $r$. To
avoid unnecessary complications, we discuss the induction step for $l = 1$. The generalization for $l > 1$
is straightforward. We show that for given $S_1$ and $T_1$, there exists a finite $T_2 \ge T_1$ such that (57) with
$T_2$ holds for at least two different link-flow-pairs.

Let $(s,k)$ denote the link-flow-pair that satisfies (56) with $T_1$. Since $(s,k) \in S_1$ implies $(s,i) \in S_1$ for
all $1 \le i \le k$, we must have $k = 1$ and $S_1 = \{(s,1)\}$. From (58), we have that

$$S_1^* = \{(r,1) \mid r \in \mathcal{S} \setminus \{s\}\} \cup N_s^*, \tag{59}$$

where $N_s^* = \{(s,2)\}$ if $H(s) > 1$, and $N_s^* = \emptyset$ if $H(s) = 1$. We discuss only the case that $H(s) > 1$,
and the other case can be easily shown following the same line of analysis. Now suppose that

$$\hat{f}_{r,j}(t) \le f_r(0), \text{ for all } (r,j) \in \mathcal{P} \setminus S_1, \text{ and all } t \ge 0, \tag{60}$$
frAt) < MO), for aU (r, j) G V\Si, and aU t > 0, (60) 
i.e., for all the link-flow-pairs except those of $S_1$, the total amount of service up to time $t$ is no greater
than the amount of the initial packets for all $t \ge 0$. We show that this assumption leads to a contradiction,
which completes the inductive step and proves the lemma.

From the base case and Lemma 3, we have $q_{s,1}(t) = \lambda_s \hat{w}_{s,1}(t)$ for all $t \ge T_1$. We view the subset
of links $S_1$ as a generalized system, and consider the time slots when there is at least one packet
transmission from outside of $S_1$, i.e., from some $(r,j) \in \mathcal{P} \setminus S_1$. For each such time slot, we say that the time
slot is unavailable to $S_1$.

1) The number of such unavailable time slots is bounded from above by $x_{n_j}$, since at every such
time slot, at least one initial packet will be transmitted and the total number of initial packets is
bounded by $\|Q(0)\| = x_{n_j}$ from (17). Hence, the amount of (scaled) time unavailable to $S_1$ is
bounded by $\|q(0)\| = 1$.

2) Since the amount of (scaled) time unavailable to $S_1$ is bounded, there exists a sufficiently large
$t \ge T_1$ such that the fraction of time that is given to $(r,j) \in \mathcal{P} \setminus S_1$ is negligible, and we must have
$\hat{w}_{\tilde{r},\tilde{j}}(t) = \Theta(1)$¹ and $\Delta \hat{w}_{\tilde{r},\tilde{j}}(t) = \Theta(1)$ for $(\tilde{r},\tilde{j}) \in \mathcal{P} \setminus (S_1 \cup S_1^*)$.

3) Then, we can restrict our focus on the generalized system $S_1$ to time $t \ge T_1$, and ignore the time
that is unavailable to $S_1$. Then Q-BP and D-BP are in some sense "equivalent" in the generalized
system for $t \ge T_1$ with the following properties: First, Q-BP will stabilize the system if the
arrival rate vector is strictly inside $\Lambda^*$. Second, since the linear relation (51) holds for all
link-flow-pairs in $S_1$ from Lemma 3, D-BP will schedule links similarly to Q-BP and also stabilizes the
generalized system $S_1$.

4) Now let us focus on $S_1^*$. Link-flow-pairs in $S_1^*$ must have some initial packets at $t \ge T_1$ because
$S_1 \cap S_1^* = \emptyset$. On the other hand, the generalized network $S_1$ is stable. This implies that the
delay metrics of link-flow-pairs in $S_1^*$ should increase at the same order as we increase $t$, i.e.,
$\hat{w}_{r^*,j^*}(t) = \Theta(t)$ for $(r^*,j^*) \in S_1^*$. Then we have $\Delta \hat{w}_{r^*,j^*}(t) = \Theta(t)$, since $\hat{w}_{r^*,j^*+1}(t) = \Theta(1)$
from $(r^*,j^*+1) \in \mathcal{P} \setminus (S_1 \cup S_1^*)$ and 2). Since the delay differentials $\Delta \hat{w}_{s,k}(t)$ for all $(s,k) \in S_1$
and $\Delta \hat{w}_{\tilde{r},\tilde{j}}(t)$ for all $(\tilde{r},\tilde{j}) \in \mathcal{P} \setminus (S_1 \cup S_1^*)$ are bounded above from stability of $S_1$ and 2),
respectively, D-BP will choose some of the link-flow-pairs in $S_1^*$ for most of the time for a sufficiently
large $t$. This implies that the amount of time unavailable to $S_1$ is $\Theta(t)$, which conflicts with our
previous statement that the fraction of time that is given to $(r,j) \in \mathcal{P} \setminus S_1$ is negligible.

¹We use the standard order notation: $g(n) = o(f(n))$ implies $\lim_{n \to \infty} (g(n)/f(n)) = 0$; and $g(n) = \Theta(f(n))$ implies
$c_1 \le \lim_{n \to \infty} (g(n)/f(n)) \le c_2$ for some constants $c_1$ and $c_2$.
As mentioned earlier, we omit the detailed proof here and refer readers to Appendix A.
The following proposition shows throughput optimality of D-BP. 

Proposition 5: D-BP can support any traffic with arrival rate vector that is strictly inside $\Lambda^*$.

Proof: We show the stability using fluid limits and standard Lyapunov techniques. From Lemmas |3] 
and m we obtain the key property for proving throughput optimality of D-BP in Eq. dST]) . i.e., after a 
finite time, there is a linear relation between queue lengths and delays in the fluid limit model. We start 
with the following quadratic-form Lyapunov function, 

$$V(\vec{q}(t)) = \frac{1}{2} \sum_{(s,k) \in \mathcal{P}} \left( q_{s,k}(t) \right)^2. \tag{61}$$

Following the line of analysis in the proof of Proposition 2, we can show that the Lyapunov function has
a negative drift if the underlying scheduler maximizes
$\sum_{(s,k) \in \mathcal{P}} (q_{s,k}(t) - q_{s,k+1}(t)) \cdot \pi_{s,k}(t)$. Now applying the linear
relation (57), we can observe that D-BP satisfies such a condition, and obtain the results. We omit the
detailed proof. ■
Although D-BP operates efficiently and achieves the optimal throughput region, it is difficult to
implement in practice due to its centralized operation and high computational complexity. Therefore,
we are interested in simpler approximations to D-BP that can achieve a guaranteed fraction of the
optimal performance. The delay-based greedy maximal algorithm is a good candidate. We
can characterize the throughput performance of the delay-based greedy scheme by combining our results
with the techniques used in [20], [21], and show that it is as efficient as its queue-length-based
counterpart, i.e., the queue-length-based greedy maximal algorithm.
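
The greedy maximal rule can be illustrated with a short sketch. The code below is not the implementation used in this paper; it is a minimal Python illustration under stated assumptions: interference is given as a hypothetical `conflicts` map from each link to the set of links it interferes with, each link carries a single scalar weight (a queue length or a delay), links with non-positive weight are skipped, and ties are broken by link name. The link names and weights in the example are made up.

```python
from typing import Dict, Hashable, List, Set


def greedy_maximal_schedule(
    weights: Dict[Hashable, float],
    conflicts: Dict[Hashable, Set[Hashable]],
) -> List[Hashable]:
    """Pick links in decreasing order of weight, skipping any link that
    interferes with a link that has already been chosen."""
    schedule: List[Hashable] = []
    blocked: Set[Hashable] = set()
    # Largest weight first; ties are broken by link name so the result
    # is deterministic (an assumption of this sketch).
    for link in sorted(weights, key=lambda l: (-weights[l], str(l))):
        if weights[link] <= 0 or link in blocked:
            continue
        schedule.append(link)
        blocked.add(link)
        blocked |= conflicts.get(link, set())
    return schedule


# Hypothetical 4-link path a-b-c-d in which adjacent links interfere.
example_conflicts = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
# The weights may be queue lengths (queue-based) or delays (delay-based).
example_delays = {"a": 2.0, "b": 7.0, "c": 5.0, "d": 4.0}
example_schedule = greedy_maximal_schedule(example_delays, example_conflicts)
```

In this example, link b is chosen first, which blocks a and c, and link d is then added. The same function serves the queue-length-based variant by passing queue lengths as weights.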

V. Numerical Results 

In this section, we first highlight the last packet problem of the queue-length-based back-pressure 
algorithm. The last packet problem implies that flows that lack packet arrivals at subsequent times 
may experience excessive delay under Q-BP, which is later confirmed in the simulations. We compare 
throughput and delay performance of Q-BP and D-BP in a grid network topology under the 2-hop
interference model.

Footnote: A greedy maximal algorithm finds its schedule in decreasing order of weight (e.g., queue
length or delay), conforming to the underlying interference constraints.

Footnote: In the 2-hop interference model, two links within a 2-hop "distance" interfere with each other.
Note that the interference model (Eq. (2)) in the problem setup is very general. We consider the 2-hop
interference model in the simulations, as it is often used to model the ubiquitous IEEE 802.11 DCF
(Distributed Coordination Function) wireless networks [22]-[25].
(b) HOL delay of the short flow (2 → 4 → 6) when λ = 3
Fig. 3. Illustration of the last packet problem under Q-BP.



We first show the last packet problem of Q-BP through simulations. We observe that the last several
packets of a short flow that carries a finite amount of data may get stuck, which could cause excessive
delay. We consider a scenario consisting of 7 nodes and 6 links as shown in Fig. 3(a), where nodes are
represented by circles and links are represented by dashed lines with link capacity. We assume a
time-slotted system. We establish three flows: one short flow (2 → 4 → 6) and two long flows (1 → 2 → 3)
and (5 → 6 → 7). The short flow arrives at the network with a finite number of packets at time 0, and
the number of packets follows a Poisson distribution with mean 10. The long flows have an infinite

Footnote: The unit of link capacity is packets per time slot.
(b) Average queue length
Fig. 4. Performance of scheduling algorithms for multi-hop traffic following Poisson distribution.



amount of data and keep injecting packets at the source nodes following a Poisson distribution with mean
rate λ at each time slot. Numerical calculation shows that the feasible rate under the 2-hop interference
model should satisfy λ < 4.44. We conduct our simulation for 10^ time slots, and plot time traces of the
HOL delay of the short flow when λ = 3. Fig. 3(b) shows that the delay increases linearly
with time under Q-BP, which implies that the last several packets of the short flow get excessively
delayed. On the other hand, D-BP succeeds in serving the short flow and keeps the delay close to 0. This
also implies that certain flows whose queue lengths do not increase because of a lack of future arrivals
(or whose inter-arrival times between groups of packets are very large) may experience a large delay under
Q-BP, which will be confirmed in the following simulations.
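
The mechanism behind the last packet problem can be seen in a single scheduling decision. The sketch below is not the simulator used for Fig. 3; it is a simplified Python illustration with made-up numbers. It assumes that all link-flow-pairs contend for one transmission opportunity with unit capacity, that the value beyond the last hop (the destination) is 0, and that the per-hop delay metrics are given directly. The helper names `differentials` and `best_link_flow_pair` are hypothetical.

```python
from typing import Dict, List, Tuple


def differentials(per_hop: List[float]) -> List[float]:
    """Return x_k - x_{k+1} for each hop k, taking the value beyond the
    last hop (the destination) to be 0 (an assumption of this sketch)."""
    padded = per_hop + [0.0]
    return [padded[k] - padded[k + 1] for k in range(len(per_hop))]


def best_link_flow_pair(per_flow: Dict[str, List[float]]) -> Tuple[str, int]:
    """Return the (flow, hop) pair with the largest differential.
    Hops are 1-indexed; ties go to the first pair in sorted flow order."""
    best = None
    for flow, values in sorted(per_flow.items()):
        for hop, diff in enumerate(differentials(values), start=1):
            if best is None or diff > best[0]:
                best = (diff, flow, hop)
    if best is None:
        raise ValueError("no link-flow-pairs given")
    return best[1], best[2]


# Hypothetical snapshot: the short flow has one packet left at hop 1 that has
# already waited 500 slots; the long flow is backlogged but its packets are fresh.
queue_lengths = {"short": [1.0, 0.0], "long": [12.0, 9.0]}
delay_metrics = {"short": [500.0, 0.0], "long": [4.0, 3.0]}

q_bp_choice = best_link_flow_pair(queue_lengths)  # uses queue-length differentials
d_bp_choice = best_link_flow_pair(delay_metrics)  # uses delay differentials
```

With these numbers the queue-length differential of the short flow stays at 1 no matter how long its packet waits, so the queue-based rule keeps choosing the long flow, whereas the delay differential of the short flow grows with waiting time and eventually dominates.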

Next, we evaluate the throughput of different schedulers in a grid network that consists of 16 nodes
and 24 links as shown in Fig. 4(a), where nodes and links are represented by circles and dashed lines,
respectively, with link capacity. We establish 9 multi-hop flows that are represented by arrows. Let
$\lambda_1 = 0.1$ and $\lambda_2 = 1$. At each time slot, there is a file arrival with probability
$p = 0.01$ for flow (11 → 10 → 9) (represented by the red thick arrow in Fig. 4(a)), and the file size
follows a Poisson distribution with mean rate $\rho\lambda_1/p$. Note that flow (11 → 10 → 9) has bursty
arrivals with a small mean rate (we simply call it the bursty flow in the following). All the other 8
flows have packet arrivals following a Poisson distribution with mean rate $\rho\lambda_2$ at each time
slot. Although these flows share the same stochastic property with an identical mean arrival rate
$\rho\lambda_2$, uniform patterns of traffic are avoided by carefully setting the link capacities
differently and placing the flows with different numbers of hops in an asymmetric manner.

We evaluate the scheduling performance by measuring average total queue lengths in the network over
time. Fig. 4(b) illustrates average queue lengths under different offered loads to examine the performance
limits of the scheduling schemes. Each result represents an average of 10 simulation runs with independent
stochastic arrivals, where each run lasts for 10^ time slots. Since the optimal throughput region is
defined as the set of arrival rates under which queue lengths remain finite (see Definition 1), we can
consider the traffic load under which the queue length increases rapidly as the boundary of the optimal
throughput region. Fig. 4(b) shows that D-BP achieves the same throughput region as Q-BP, which supports
the theoretical results on throughput optimality.
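
The idea of probing the boundary by scaling the load can be sketched on a toy system. The code below is not the grid-network simulation of Fig. 4; it is a minimal Python illustration that assumes a single queue served at one packet per slot, with arrivals in each slot drawn as the sum of two Bernoulli trials (used here in place of Poisson arrivals only because the Python standard library has no Poisson sampler). The function names, the number of slots, and the seed are arbitrary choices for the example.

```python
import random
from typing import List


def average_queue_length(load: float, slots: int = 20000, seed: int = 0) -> float:
    """Time-averaged backlog of a single queue served at 1 packet per slot.
    Arrivals per slot are the sum of two Bernoulli(load / 2) draws, so the
    mean arrival rate equals `load` (valid for 0 <= load <= 2)."""
    rng = random.Random(seed)
    queue = 0
    total = 0
    for _ in range(slots):
        arrivals = int(rng.random() < load / 2) + int(rng.random() < load / 2)
        queue = max(queue + arrivals - 1, 0)  # serve one packet if available
        total += queue
    return total / slots


def sweep(loads: List[float]) -> List[float]:
    """Average queue length for each offered load in `loads`."""
    return [average_queue_length(rho) for rho in loads]


loads = [0.5, 0.8, 1.2]  # the service rate is 1, so 1.2 is infeasible
averages = sweep(loads)
```

Below the service rate the average backlog stays small, while above it the backlog grows with the length of the run; the load at which this transition occurs marks the boundary in the same sense as in Fig. 4(b).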

Although Q-BP and D-BP perform similarly in terms of average queue length (or average delay) over 
the network, the tail of the delay distribution of Q-BP could be substantially longer because certain flows 
are starved. This could cause enormous unfairness between flows, resulting in very poor QoS for certain 
flows. Note that although a bursty flow is a long flow that has an infinite amount of data, the arrivals 
occur in a dispersed manner (i.e., the inter-arrival times between groups of packets are very large) and 
we can view this bursty flow as consisting of many short flows. Thus, we expect that the bursty flow may 

Footnote: Note that given the network topology, it is hard to find the exact boundary of the optimal
throughput region of scheduling policies in a closed form. Hence, we probe the boundary by scaling the
amount of traffic. After we choose $\vec{\lambda}$, which determines the direction of the traffic load
vector, we run our simulations with traffic load $\rho\vec{\lambda}$ while changing $\rho$, which scales
the traffic loads.
Fig. 5. Delay distribution of the bursty flow under ρ = 0.2.
Fig. 6. Mean delay, top 1% and top 5% largest delay of the bursty flow over offered loads.



experience a very large delay under Q-BP due to the lack of subsequent packet arrivals over long periods
of time, which does not allow the queue lengths to grow and thus contributes to the long tail of the
delay distribution. However, this phenomenon may not manifest itself in terms of a higher average delay
for Q-BP, as can be observed in Fig. 4(b), because the amount of data corresponding to the bursty flow in
the simulation is small compared to the other flows. On the other hand, D-BP can achieve better fairness
by scheduling the links based on delays and not starving bursty or variable flows. We confirm this in the
following observations.

We now illustrate the effectiveness of using D-BP over Q-BP in terms of how each scheme affects
the delay distribution of bursty flows. We plot the delay distribution of the bursty flow in Fig. 5 under
ρ = 0.2. It reveals that the tail of the delay distribution under D-BP vanishes much faster than under
Q-BP. Further, we plot the mean delay and the top 1% and top 5% largest delays of the bursty flow over
offered loads in Fig. 6. All these delays under D-BP are substantially less than under Q-BP, which implies
that D-BP successfully eliminates the excessive packet delays. The top 0.1% largest delays of the whole
network demonstrate behavior similar to that in Fig. 6, and the results are omitted. This confirms that
Q-BP causes a substantially long tail for the delay distribution of the network due to the starvation of
the bursty flow, while D-BP overcomes this and achieves better fairness among the flows by scheduling the
links based on delays.

VI. Conclusion 

In this paper, we develop a throughput-optimal delay-based back-pressure scheme for multi-hop wireless 
networks. We introduce a new delay metric suitable for multi-hop traffic and establish a linear relation 
between queue lengths and delays in the fluid limit model, which plays a key role in the performance 
analysis and proof of throughput-optimality. Delay-based schemes provide a simple way around the well- 
known last packet problem that plagues queue-based schedulers, and avoid flow starvation. As a result, 
the excessively long delays that could be experienced by certain flows under queue-based scheduling 
schemes are eliminated without any loss of throughput.

Appendix A 
Proof of Lemma 4

Proof: We show that there exists a time $T > 0$ such that the fluid limits satisfy
$f_{s,k}(T) > f_s(0)$ for all link-flow-pairs $(s,k) \in \mathcal{P}$. We prove this by induction. We show
that there exists a time $T$ with at least one link-flow-pair that satisfies the condition, and that,
given a set of link-flow-pairs satisfying the condition, at least one additional link-flow-pair will
satisfy the condition by increasing $T$.

Footnote: Suppose there are $N$ packets sorted by their delays from the largest to the smallest; the top
$X$% largest delay is defined as the delay of the $\lfloor N X / 100 \rfloor$-th packet. If
$N X / 100 < 1$, it means the maximum delay. For example, if the delays are [3, 2, 1, 1, 1], the
top 20% largest delay is 2.
We first fix an arbitrary $\epsilon_1 > 0$ and define a constant
$K_1 = \max_s H(s) + \left( \sum_s \lambda_s H(s) \right) \epsilon_1$. In the fluid limit model, we will
have

$$f_s(\epsilon_1) = f_s(0) + \lambda_s \epsilon_1 > f_s(0), \quad \text{for all } s \in S. \tag{62}$$

Since queue lengths are no greater than the injected amount of data, we have that
$q_{s,k}(\epsilon_1) \le f_s(\epsilon_1)$ for all $(s,k) \in \mathcal{P}$, and thus,



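$$\sum_{(s,k) \in \mathcal{P}} q_{s,k}(\epsilon_1) \le \sum_s H(s) \left( f_s(0) + \lambda_s \epsilon_1 \right) \le K_1, \tag{63}$$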

where the last inequality is from Eq. (30): $\sum_s f_s(0) = 1$ and the definition of $K_1$. Now we show
by induction that there exists a finite time $T$ such that

$$f_{s,k}(T) > f_s(0), \quad \text{for all link-flow-pairs } (s,k). \tag{64}$$
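
The bound by $K_1$ in (63) can be unpacked step by step. The following is a sketch under the assumptions that $H(s)$ is the number of hops of flow $s$ (so that flow $s$ contributes $H(s)$ link-flow-pairs) and that $f_s(0) \ge 0$; it uses only (62), the normalization $\sum_s f_s(0) = 1$, and the definition of $K_1$.

```latex
\begin{align*}
\sum_{(s,k) \in \mathcal{P}} q_{s,k}(\epsilon_1)
  &\le \sum_{s} H(s)\, f_s(\epsilon_1)
   = \sum_{s} H(s)\,\bigl(f_s(0) + \lambda_s \epsilon_1\bigr) \\
  &= \sum_{s} H(s)\, f_s(0) + \Bigl(\sum_{s} \lambda_s H(s)\Bigr)\epsilon_1 \\
  &\le \Bigl(\max_{s} H(s)\Bigr) \sum_{s} f_s(0)
     + \Bigl(\sum_{s} \lambda_s H(s)\Bigr)\epsilon_1 \\
  &= \max_{s} H(s) + \Bigl(\sum_{s} \lambda_s H(s)\Bigr)\epsilon_1 = K_1 .
\end{align*}
```

The first step applies $q_{s,k}(\epsilon_1) \le f_s(\epsilon_1)$ to each of the $H(s)$ hops of flow $s$, and the third step bounds each $H(s)$ by its maximum over flows.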

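Base Case: There exists $T_1 > 0$ such that for at least one link-flow-pair $(s,k)$,

$$f_{s,k}(T_1) > f_s(\epsilon_1). \tag{65}$$

Let $T_1 = \epsilon_1 + K_1/\pi^*$, where $\pi^*$ is the fraction of time slots between
$(x_{n_j}\epsilon_1, x_{n_j}T_1]$ when at least one packet is served in the original system. Suppose that
(65) does not hold. Then, for all sufficiently large $x_{n_j}$, we must have

$$\sum_{(s,k) \in \mathcal{P}} \left( F_{s,k}(x_{n_j} T_1) - F_{s,k}(x_{n_j} \epsilon_1) \right) \ge \pi^* x_{n_j} (T_1 - \epsilon_1) + o(x_{n_j}), \tag{66}$$

where the term $o(x_{n_j})$ satisfies $o(x_{n_j})/x_{n_j} \to 0$ as $x_{n_j} \to \infty$. Dividing both
sides of the above inequality by $x_{n_j}$ and letting $x_{n_j} \to \infty$, we obtain

$$\sum_{(s,k) \in \mathcal{P}} \left( f_{s,k}(T_1) - f_{s,k}(\epsilon_1) \right) \ge K_1. \tag{67}$$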



Then, from (63), we have

{s,k)ep {s,k)ev {s,k)ep 

(68) 

= E 

(s,k)€V 
Therefore, $f_{s,k}(T_1) > f_s(\epsilon_1)$ for at least one link-flow-pair $(s,k)$.

Inductive Step: Suppose that there exist $T_l$ and a subset $\mathcal{S}_l \subset \mathcal{P}$ such that
for all $(s,k) \in \mathcal{S}_l$, we have

$$f_{s,k}(T_l) > f_s(\epsilon_1). \tag{69}$$

Then there exist $T_{l+1} > T_l$, where $1 \le l < H(s)$, and a link-flow-pair
$(s,k) \in \mathcal{P} \setminus \mathcal{S}_l$ such that

$$f_{s,k}(T_{l+1}) > f_s(\epsilon_1). \tag{70}$$

Further we define $\mathcal{S}_{l+1} = \mathcal{S}_l \cup \{(s,k)\}$.

We prove the inductive step for $l = 1$. The generalization for $l > 1$ is straightforward. Hence, we
show that for given $\mathcal{S}_1$ and $T_1$, there exists a finite $T_2 > T_1$ such that (70) with $T_2$
holds for at least two different link-flow-pairs.

Let $(s,k)$ denote the link-flow-pair that satisfies (69) with $T_1$. Then, we have
$\mathcal{S}_1 = \{(s,1)\}$ and can specify the set $\mathcal{S}_1^*$ of link-flow-pairs
$(s,k) \in \mathcal{P} \setminus \mathcal{S}_1$ that are closest to the source of each flow from (59).
We illustrate the case that $H(s) > 1$; the other case that $H(s) = 1$ can be easily shown following
the same line of analysis. Now we have

$$f_{s,1}(t) > f_s(\epsilon_1), \quad \text{for all } t \ge T_1. \tag{71}$$

For all the other link-flow-pairs, we observe that

$$\sum_{(r,j) \in \mathcal{P} \setminus \mathcal{S}_1} \left( f_r(\epsilon_1) - f_{r,j}(T_1) \right) \le K_1. \tag{72}$$

Suppose that for all $t \ge T_1$, we have

$$f_{r,j}(t) \le f_r(\epsilon_1), \quad \text{for all } (r,j) \in \mathcal{P} \setminus \mathcal{S}_1. \tag{73}$$

In the following, we provide a choice of $T_2 > T_1$ such that assumption (73) leads to a contradiction,
which completes the inductive step, and then the lemma follows by induction.

We view each sample path $X^{(x_{n_j})}(t)$ after time slot $\lfloor x_{n_j} T_1 \rfloor$ as a generalized
system with link-flow-pairs in $\mathcal{S}_1 = \{(s,1)\}$. We say that a time slot is unavailable to
$\mathcal{S}_1$ when a packet from a link-flow-pair $(r,j) \in \mathcal{P} \setminus \mathcal{S}_1$ is
transmitted during the time slot. Let $h_{\mathcal{S}_1}(t)$ denote the (scaled) amount of time
unavailable to $\mathcal{S}_1$ during the period $(T_1, t]$ in the scaled system, for all $t \ge T_1$. For
the scaled generalized system $\mathcal{S}_1$, we obtain from (72) and (73) that

$$h_{\mathcal{S}_1}(t) \le \sum_{(r,j) \in \mathcal{P} \setminus \mathcal{S}_1} \left( f_r(\epsilon_1) - f_{r,j}(T_1) \right) \le K_1, \tag{74}$$

Footnote: Note that if $(s,k) \in \mathcal{S}_l$, we must have $(s,k-1) \in \mathcal{S}_l$. Hence, for
$l = 1$, we must have the first hop of a flow, i.e., $\mathcal{S}_1 = \{(s,1)\}$ for some $s$.
for all $t \ge T_1$. Since the time unavailable to $\mathcal{S}_1$ is bounded, as time $t$ increases, only
link-flow-pairs in $\mathcal{S}_1$ will be scheduled, which implies that the weight of link-flow-pairs of
$\mathcal{P} \setminus \mathcal{S}_1$ becomes negligible. This allows us to focus on $\mathcal{S}_1$.
Owing to Lemma 3 and the definition of $\mathcal{S}_1$, the linear relation between queue lengths and
delays holds for the link-flow-pair in $\mathcal{S}_1$. Then, it can be easily shown following the same
line of analysis of Proposition 5 that link-flow-pairs in $\mathcal{S}_1$ are stable under D-BP. Hence,
for all $(s,k) \in \mathcal{S}_1$, we have

$$q_{s,k}(t) \le C_1, \quad \text{for all } t \ge T_1, \tag{75}$$

and thus

$$w_{s,k}(t) \le \frac{C_1}{\lambda_s}, \quad \text{for all } t \ge T_1, \tag{76}$$

for some constant $C_1$, which depends on $T_1$ and $K_1$ and does not depend on time $t$.

Recall that $\mathcal{S}_1^*$ denotes the set of link-flow-pairs that are closest to the source of each
flow out of $\mathcal{S}_1$, defined in (48). We choose $t$ large enough such that for all
$(s,k) \in \mathcal{S}_1$ and $(r^*,j^*) \in \mathcal{S}_1^*$,

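From (73), there are packets that arrive at the source by time $\epsilon_1$ and have not been served at
the $j$-th hop by time $t$ for all $(r,j) \in \mathcal{P} \setminus \mathcal{S}_1$; hence we obtain that

$$t - \epsilon_1 \le w_{r,j}(t) \le t, \quad \text{for all } (r,j) \in \mathcal{P} \setminus \mathcal{S}_1. \tag{78}$$

Since $(r^*,j^*), (r^*,j^*+1) \in \mathcal{P} \setminus \mathcal{S}_1$ for
$(r^*,j^*) \in \mathcal{S}_1^*$, we have

$$\hat{w}_{r^*,j^*+1}(t) = w_{r^*,j^*+1}(t) - w_{r^*,j^*}(t) \le \epsilon_1, \tag{79}$$

for all $(r^*,j^*) \in \mathcal{S}_1^*$. From (78) and the fact that $(r^*,j^*-1) \in \mathcal{S}_1$, we
have

$$\hat{w}_{r^*,j^*}(t) \ge t - \epsilon_1 - \frac{C_1}{\lambda_{r^*}}, \tag{80}$$

for all $(r^*,j^*) \in \mathcal{S}_1^*$. Then, we have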

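$$\begin{aligned}
\Delta\hat{w}_{s,k}(t) &\overset{(a)}{\le} \frac{C_1}{\lambda_s} - \left( t - \epsilon_1 - \frac{C_1}{\lambda_s} \right) \\
&\overset{(b)}{<} \left( t - \epsilon_1 - \frac{C_1}{\lambda_{r^*}} \right) - \epsilon_1 \\
&\overset{(c)}{\le} \hat{w}_{r^*,j^*}(t) - \hat{w}_{r^*,j^*+1}(t)
\end{aligned} \tag{81}$$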

Footnote: Note that since Lemmas 3 and 4 hold for the generalized system $\mathcal{S}_1$, Proposition 5
can be applied to $\mathcal{S}_1$.
for all $(s,k) \in \mathcal{S}_1$ and $(r^*,j^*) \in \mathcal{S}_1^*$, where (a) is from (76) and (80),
(b) is from (77), and (c) is from (80) and (79). Hence, for large $t$, we have that

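$$\Delta\hat{w}_{s,k}(t) < \min_{(r^*,j^*) \in \mathcal{S}_1^*} \left\{ \Delta\hat{w}_{r^*,j^*}(t) \right\}. \tag{82}$$

Also, from (78), we have that

$$\Delta\hat{w}_{\tilde{r},\tilde{j}}(t) \le \epsilon_1, \tag{83}$$

for all $(\tilde{r},\tilde{j}) \in \mathcal{P} \setminus (\mathcal{S}_1 \cup \mathcal{S}_1^*)$. Since (83)
holds for an arbitrarily small $\epsilon_1$ and from (82), D-BP favors link-flow-pairs of
$\mathcal{S}_1^*$ for all large $t$. Note that $\Delta\hat{w}_{s,k}(t)$ is bounded for
$(s,k) \in \mathcal{S}_1$ from (76), $\Delta\hat{w}_{\tilde{r},\tilde{j}}(t)$ is bounded for
$(\tilde{r},\tilde{j}) \in \mathcal{P} \setminus (\mathcal{S}_1 \cup \mathcal{S}_1^*)$ from (83), and
$\Delta\hat{w}_{r^*,j^*}(t)$ increases linearly in the order of $t$ for $(r^*,j^*) \in \mathcal{S}_1^*$
from (80). Then for large $t$, link-flow-pairs in $\mathcal{S}_1^*$ will be scheduled most of the time
under the delay-based scheduling scheme. Then we can choose a large $T_2$ such that

$$h_{\mathcal{S}_1}(T_2) \ge T_2 - T_1 > K_1. \tag{84}$$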

However, this contradicts (74), which shows that the assumption (73) is false, and there exists a large
$T_2$ such that

$$f_{s,k}(T_2) > f_s(\epsilon_1), \quad \text{for at least one } (s,k) \in \mathcal{P} \setminus \mathcal{S}_1. \tag{85}$$

In fact, our choice of $T_2$ depends on the set $\mathcal{S}_1$. However, since there is only a finite
number of flows, we can always choose a large enough $T_2$ so that (85) holds for some
$(s,k) \in \mathcal{P} \setminus \mathcal{S}_1$.

■ 